Statistical Inference for Variable Importance

Author

  • Mark J. van der Laan
Abstract

Many statistical problems involve learning the importance or effect of a variable for predicting an outcome of interest, based on a sample of n independent and identically distributed observations on a list of input variables and an outcome. For example, although prediction/machine learning is, in principle, concerned with learning from the data the optimal unknown mapping from input variables to an outcome, the typical reported output is a list of importance measures for each input variable. The typical approach in prediction has been to learn the unknown optimal predictor from the data and to derive, for each input variable, the variable importance from the obtained fit. In this article we propose a new approach which involves, for each variable separately, 1) carefully defining the desired variable importance as a real-valued parameter, 2) deriving the efficient influence curve, and thereby the optimal estimating function, for this parameter in the assumed (possibly nonparametric) model, and 3) developing a corresponding locally efficient estimator of this variable importance, obtained by substituting data-adaptive estimators for the nuisance parameters in the optimal estimating function. We illustrate this methodology in the context of prediction, and obtain in this manner locally optimal estimators of marginal variable importance and covariate-adjusted variable importance, accompanied by p-values and statistical inference. We also propose a road map for statistical analysis based on this approach. Finally, we generalize this methodology to variable importance parameters for time-dependent variables.
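
The three steps in the abstract can be illustrated with a minimal sketch. The abstract does not state a specific parameter, so everything concrete here is an assumption made for illustration: a binary variable A, covariates W, the importance parameter psi = E[E(Y | A=1, W) - E(Y | A=0, W)], and the well-known augmented inverse-probability-weighted estimating function for that parameter. The names `fit_logistic` and `variable_importance` are hypothetical, and simple parametric fits stand in for the data-adaptive nuisance estimators the abstract calls for.

```python
import math
import numpy as np

def fit_logistic(X, a, n_iter=25):
    """Newton-Raphson logistic regression; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (a - p))
    return beta

def variable_importance(y, a, W):
    """Estimate psi = E[E(Y|A=1,W) - E(Y|A=0,W)] (assumed parameter)."""
    n = len(y)
    X = np.column_stack([np.ones(n), W])

    # Nuisance 1: outcome regression Q(a, W), here a linear working model.
    coef, *_ = np.linalg.lstsq(np.column_stack([X, a]), y, rcond=None)
    q1 = np.column_stack([X, np.ones(n)]) @ coef   # Q(1, W)
    q0 = np.column_stack([X, np.zeros(n)]) @ coef  # Q(0, W)
    qa = np.where(a == 1, q1, q0)                  # Q(A, W)

    # Nuisance 2: g(W) = P(A=1 | W), here a logistic working model.
    g = 1.0 / (1.0 + np.exp(-(X @ fit_logistic(X, a))))
    g = np.clip(g, 0.01, 0.99)  # guard against extreme weights

    # Solve the estimating equation: mean of D(psi) = 0 gives psi = mean(d).
    d = (a / g - (1.0 - a) / (1.0 - g)) * (y - qa) + q1 - q0
    psi = d.mean()

    # Influence-curve-based standard error and two-sided normal p-value.
    se = (d - psi).std(ddof=1) / math.sqrt(n)
    p_value = math.erfc(abs(psi / se) / math.sqrt(2.0))
    return psi, se, p_value

# Simulated data (hypothetical): the true importance of A is 2.0.
rng = np.random.default_rng(0)
n = 2000
W = rng.normal(size=(n, 2))
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * W[:, 0]))).astype(float)
y = 2.0 * a + W[:, 0] - 0.5 * W[:, 1] + rng.normal(size=n)

psi, se, p_value = variable_importance(y, a, W)
print(f"psi = {psi:.3f}, se = {se:.3f}, p = {p_value:.3g}")
```

Applied once per input variable, with the remaining variables playing the role of W, a procedure of this shape yields one estimate, standard error, and p-value per variable, which is the kind of output the abstract describes.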

Similar resources

Determining the importance of soil properties for clay dispersibility using artificial neural network and adaptive neuro-fuzzy inference system

The main purpose of the current research is to compare the results of an Artificial Neural Network (ANN) with those of an Adaptive Neuro-Fuzzy Inference System (ANFIS) in determining the importance of soil properties affecting clay dispersibility. After taking samples from two depths of 0-40 and 40-80 cm, the spontaneous and mechanical dispersions of clay were recorded using both weighing and ...

Forecasting Industrial Production in Iran: A Comparative Study of Artificial Neural Networks and Adaptive Neuro-Fuzzy Inference System

Forecasting industrial production is essential for efficient planning by managers. Although there are many statistical and mathematical methods for prediction, the use of intelligent algorithms with desirable features has made significant progress in recent years. The current study compared the accuracy of the Artificial Neural Networks (ANN) and Adaptive Neuro-Fuzzy Inference System (ANFIS) app...

Thyroid disorder diagnosis based on Mamdani fuzzy inference system classifier

Introduction: Classification and prediction are two of the most important applications of statistical methods in medicine. Classical classification relies on clinical symptoms and does not involve the use of specialized information and knowledge. A classifier that can combine all of this information is therefore necessary. The aim of this s...

Inference for Neural Network Predictive Models with Impulse Interventions

Neural Networks (NN) have demonstrated remarkable time series fitting and prediction abilities, outperforming other methods in several applications, particularly linear models such as dynamic linear regression. However, due to their nature, NNs are not easy to interpret and are often considered black box models. The importance of each independent variable is hard to estimate and therefor...

Sample size determination for logistic regression

The problem of sample size estimation is important in medical applications, especially in cases of expensive measurements of immune biomarkers. This paper describes the problem of logistic regression analysis with sample size determination algorithms, namely the methods of univariate statistics, logistic regression, cross-validation and Bayesian inference. The authors, treating the regr...

Publication date: 2016